On Inter-rater Agreement in Audio Music Similarity

نویسنده

Arthur Flexer

چکیده

One of the central tasks in the annual MIREX evaluation campaign is the ”Audio Music Similarity and Retrieval (AMS)” task. Songs which are ranked as being highly similar by algorithms are evaluated by human graders as to how similar they are according to their subjective judgment. By analyzing results from the AMS tasks of the years 2006 to 2013 we demonstrate that: (i) due to low inter-rater agreement there exists an upper bound of performance in terms of subjective gradings; (ii) this upper bound has already been achieved by participating algorithms in 2009 and not been surpassed since then. Based on this sobering result we discuss ways to improve future evaluations of audio music similarity.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Problem of Limited Inter-rater Agreement in Modelling Music Similarity

One of the central goals of Music Information Retrieval (MIR) is the quantification of similarity between or within pieces of music. These quantitative relations should mirror the human perception of music similarity, which is however highly subjective with low inter-rater agreement. Unfortunately this principal problem has been given little attention in MIR so far. Since it is not meaningful t...

متن کامل

Human Similarity Judgments: Implications for the Design of Formal Evaluations

This paper presents findings of a series of analyses of human similarity judgments from the Symbolic Melodic Similarity, and Audio Music Similarity tasks from the Music Information Retrieval Evaluation Exchange (MIREX) 2006. The categorical judgment data generated by the evaluators is analyzed with regard to judgment stability, inter-grader reliability, and patterns of disagreement, both within...

متن کامل

Test-Retest and Inter-Rater Reliability Study of the Schedule for Oral-Motor Assessment in Persian Children

Objectives: Reliable and valid clinical tools to screen, diagnose, and describe eating functions and dysphagia in children are highly warranted. Today most specialists are aware of the role of assessment scales in the treatment of affected individuals. However, the problem is that the clinical tools used might be nonstandard, and worldwide, there is no integrated assessment performed to assess ...

متن کامل

Glioblastoma Segmentation: Comparison of Three Different Software Packages

To facilitate a more widespread use of volumetric tumor segmentation in clinical studies, there is an urgent need for reliable, user-friendly segmentation software. The aim of this study was therefore to compare three different software packages for semi-automatic brain tumor segmentation of glioblastoma; namely BrainVoyagerTM QX, ITK-Snap and 3D Slicer, and to make data available for future re...

متن کامل

Nurse-Physician Agreement on Triage Category: A Reliability Analysis of Emergency Severity Index

Background and Objectives: MThe Emergency Severity Index (ESI) triage is commonly used in clinical settings to determine the patients’ emergency severity. However, the reliability of this index is not sufficiently explored. The present study examines the inter-rater reliability of ESI by comparing triage ratings as performed by nurses and physicians. Methods: This prospective cross-sectional st...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2014

On Inter-rater Agreement in Audio Music Similarity

نویسنده

چکیده

منابع مشابه

The Problem of Limited Inter-rater Agreement in Modelling Music Similarity

Human Similarity Judgments: Implications for the Design of Formal Evaluations

Test-Retest and Inter-Rater Reliability Study of the Schedule for Oral-Motor Assessment in Persian Children

Glioblastoma Segmentation: Comparison of Three Different Software Packages

Nurse-Physician Agreement on Triage Category: A Reliability Analysis of Emergency Severity Index

عنوان ژورنال:

اشتراک گذاری